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Abstract 

Background: Intego is the only operational computerized morbidity registration network in Belgium based on 
general practice data. Intego collects data from over 90 general practitioners. All the information is routinely 
collected in the electronic health record during daily practice. 

Methods: In this article we describe the design and methods used within the Intego network together with some 
of its basic results. The collected data, the quality control procedures, the ethical-legal aspects and the statistical 
procedures are discussed. 

Results: Intego contains longitudinal information on 285 357 different patients, corresponding to over 2.3% of 
the Flemish population representative in terms of age and sex. More than 3 million diagnoses, 12 million drug 
prescriptions and 29 million laboratory tests have been recorded. 

Conclusions: Intego enables us to present and compare data on health parameters, incidence and prevalence rates, 
laboratory results, and prescribed drugs for all relevant subgroups on a routine basis and is unique in Belgium. 
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Background 

General practioners such as William Pickles in the UK 
and Frans Huyghen in the Netherlands founded general 
practice-based epidemiology in the last century. They 
kept meticulous notes about patients' disease histories 
and other relevant information. They realised that gen- 
eral practice provided them with the opportunity to 
study morbidity in an unselected population over many 
decades [1]. 

Computerized patient records in general practice brought 
a significant change. Databases could be created with longi- 
tudinal data covering millions of patient-years. It became 
easier to study health parameters, diagnoses, laboratory re- 
sults, prescribed drugs and their interrelations. Many net- 
works were established each with their own accents. 

Intego is one of those general practice databases with in- 
formation from over 90 general practitioners and currently 
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covering 285 thousand individual patients and over two 
million patient-years. All information is routinely collected 
and derived from the electronic health record during daily 
practice. The network was founded in 1994 and data is 
sent to a central database at regular intervals. Its initial 
aim was to collect incidence rates of diagnoses presented 
to the general practitioner. Almost half of the Belgian gen- 
eral practitioners work in solo-practices, without add- 
itional staff and rather demand driven. Theoretically all 
patients can directly contact a specialist, although most 
patients will be referred to them by a general practitioner 
[2]. Specialists should inform the general practitioner by 
means of a digital letter. Patients are not registered in a 
particular practice, unlike in the Netherlands, where the 
general practitioner serves as a gatekeeper, but tend to 
visit the same practice over time. 

The gathered information is used as a basis for teach- 
ing, quality improvement interventions, and research, as 
well as for policy making by both physicians' organisa- 
tions and governmental bodies. Over the years, the net- 
work's aims broadened and now also include the natural 
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history of diseases, etiological and prognostic research, 
evaluation of population-based interventions with re- 
spect to prevention and quality improvement and rela- 
tions between diagnostic categories, laboratory results 
and prescribed drugs. The network is mainly funded by 
the Health ministry of the Flemish government. Ad hoc 
research projects serve as an additional funding source. 

In this paper we describe the design and methods 
used within the Intego network together with some of 
its basic results. 

Methods 

Data collection 

At the start of the Intego-project it was decided to use 
the medical electronic health record - Medidoc" " which 
is one of the few existing packages in Belgium allowing 
routine input of coded diagnoses and other coded data. 
All data are recorded by the general practitioner using 
keywords in a predetermined field in the electronic 
health record. Each of the 67500 keywords is associated 
with a unique program-specific internal code and can be 
linked to classifications as ICPC-2, ICD-10 and ATC- 
DDD (Table 1). 

When a general practitioner agrees to participate a 
software module copies the data from the general practi- 
tioner into a text file. For apparent privacy reasons each 
patient is assigned a random number. This file is then 
encrypted and sent to an independent trusted third party 
that recodes the data and subsequently sends it to the 
department of general practice of the KU Leuven where 
it is stored into the central database. The trusted third 
party procedure has been in place since 2012. 

Quality of the data 
Selection procedure 

General practitioners willing to participate in Intego 
have to pass three general quality criteria. First, the 
average number of new diagnoses per patient per year 
should be higher than one. This number was chosen 
based on a graphical presentation of the average number 
of new dagnoses per year, which showed a clear cutpoint 
at one. This number corresponds with what was found 
in the Nijmegen Continuous Morbidity Registration [2]. 
Secondly, the percentage of diagnoses recorded without 
using keywords should be less than five percent. Finally, 
these parameters must remain stable for at least three 
years. With these criteria, in particular the first one, we 



intend to minimize the risk of recording bias, where general 
practitioners only register certain, e.g. serious, diagnoses. 

Collected data 

The following data are copied from the general practi- 
tioner's electronic health record into the Intego database: 
year of birth, gender, the years that the patient contacted 
the practice, diagnoses, drug prescriptions, laboratory re- 
sults, some biomedical parameters such as blood pres- 
sure, height, weight, smoking status and mortality. Other 
lifestyle information, such as nutrition, is not recorded. 

Once the data are in the database a check is performed 
to see whether the data are as correct and complete as 
possible [3-5]. With regard to registration networks, cor- 
rectness refers to whether a patient with a diagnosis in 
the electronic health record also suffers from that par- 
ticular disease. Completeness refers to whether the rec- 
ord of a patient with a certain disease contains that 
diagnosis. Validation of a network can be performed by 
means of external (comparison with other registries or 
studies as a measure of completeness) and internal 
methods (tracers as a measure of correctness). Tracers 
are known relationships between background character- 
istics, gender, age, laboratory test results or medication 
and disease: e.g. patients receiving insulin treatment 
should also have a registered diagnosis for diabetes. 

External validation of the Intego database has been ex- 
amined by means of national and international compari- 
sons. Nationally, overall cancer incidence was compared 
to the Limburg Cancer Registry (LIKAR). Our incidence 
rates of influenza and acute respiratory illness were com- 
pared to the European Influenza Surveillance Scheme, 
EISN [6] . Data found in Intego are routinely compared to 
the Tweede rationale studie, studying morbidity and care 
in Dutch general practice [7]. Serious infections in chil- 
dren were compared to data from other networks [8]. 
Several diseases were compared to the Dutch CMR 
(Nijmegen) and RNH (Maastricht). 

Other validation methods are either irrelevant in these 
systems (for example comparison with paperbased data) 
or impossible (for example merging with other databases 
such as cancer registries) because patients cannot be 
identified due to privacy reasons. The feasibility of link- 
ing patient data from other sources (e.g., hospital) with 
Intego data by means of a trusted third party is currently 
under study. 



Table 1 Notation of diagnoses in Medidoc 



Free text Keyword Program code ICPC-code ICD-9-CMcode ICD-10 code 

Complex fracture of leg — > L73 
Fracture tibia diaphysis closed — > Q82320 

Q82320 -> L73 -» 823.20 -> S82.20 
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Analysis method 
Basic analysis 

Intego can provide data about incidences and preva- 
lences of diseases, distribution of risk factors, time 
trends, age- and gender differences and drug use. 

Incidence is calculated as the number of new cases of 
disease divided by the person-time magnitude [9]. Inci- 
dence rates can be provided for the total population, or 
stratified by age group and/or gender or some other ad 
hoc defined subgroup and can be standardized for the 
Flemish, Belgian, European or World population. The 
denominator is the yearly contact group (YCG), patients 
which visit the practice at least once in a given year or 
the practice population, all patients in a practice, con- 
sisting of the YCG plus the group which does not visit 
their general practitioner in a given period. This last 
group is estimated from all non-profit health insurance 
agencies, which includes complete data on all reim- 
bursed medical and paramedical acts, age, gender and 
residence as well as the total insured population. A 
yearly correction factor is calculated, to estimate the per- 
centage of patients consulting a general practitioner by 
age group, gender and region and hence the practice 
population. 

The prevalence of a population is the proportion of 
the population with the disease at a specified time. Un- 
like incidence rates, which focus on new events, preva- 
lence focuses on existing states. Because of the design of 
Intego (no episode registration and no recording of cure) 
prevalence rates can only be calculated on incurable 
chronic diseases, such as diabetes. 

Gender differences are usually expressed as sex ratios, 
which represent the incidences in female to male rate 
(or vice versa). Age is divided into Wonca-age groups or 
analysed as a continuous variable. 

Identifying rising and declining trends in disease and 
other medical data is an important part of epidemiology. 
Environmental changes, prevention, public health action, 
behavioral changes etc. can all result in disease incidence 
changes. The Intego database can provide data from 
1994 until now and can study time trends or seasonal 
effects in diseases using standard time series analysis 
techniques [10], [6]. 

Medication is coded according to the Belgian coding 
system CNK (national codes) which can easily be trans- 
lated into the ATC-DDD format. These data can be 
used for pharmacovigilance and pharmacoepidemiologi- 
cal purposes. 

Design 

Many study designs can be used, such as incidence stud- 
ies, retrospective cohort studies, case-control studies and 
cross-sectional designs in an unselected patient group 
or subgroup. For each individual study, the sample 



population is chosen according to specific inclusion and 
exclusion criteria depending on the aim of the study. 

Statistics and data analysis 

When a research question is formulated, and the use of 
morbidity registration data is deemed relevant to answer 
this question, a protocol is written, data is collected, sta- 
tistically analyzed and results are summarized. There are 
many approaches for making inferences from observa- 
tional studies, some focusing on the data collection 
(design), others on the statistical analyses. However, even 
with the best designs, observational studies do not auto- 
matically control for selection bias. Therefore matching, 
standardization, stratification and/or covariance adjust- 
ment are needed [11]. The purpose of these techniques 
is to create index and control groups eliminating bias as 
much as possible. Matching and covariate adjustment 
are most often used. For covariate adjustment the vari- 
ables that should be balanced (mostly at least age, gen- 
der and practice) are put explicitly in the statistical 
model and this way adjusted for. This method could 
however produce biased results if there is an additional 
imbalance in background variables which are not stated 
in the model (e.g. smoking or other confounders). 
Matching is based on at least gender, age (+/- 5 years) 
and within practices and constitutes most likely of a 5 to 
1 ratio [12]. Standardization can be performed with any 
relevant population, including one of the comparison 
groups. Most frequently used are the standard Flemish 
or European populations. The latter is most popular 
in cancer related epidemiological studies [13]. Statistical 
techniques which are often used are ordinary least squares 
regression, logistical regression, and survival analysis. 

An internal manual for analyzing repeatedly intro- 
duced data, such as laboratory (e.g. HbAlc) or clinical 
data (e.g. blood pressure) was constructed in order to 
standardize analysis procedures disposable for all re- 
searchers using the database. For instance, in some lon- 
gitudinal analyses, we aggregate individual values on a 
yearly basis. For LDL-cholesterol and HbAlc, we take 
the last value of each year, but for creatinine e.g. we take 
the average value of the last two measurements of each 
year in order to account for the important within subject 
variability of this variable [14]. 

Legal ethical aspects 

At the start of the project a proposal was submitted to 
the Belgian Privacy Commission. No comments were 
provided by them. The procedures were also reviewed 
and approved by the ethical review board of the medical 
school of the Katholieke Universiteit Leuven [15]. More 
recendy, the Sectoral Committee 'Health' of the Belgian 
privacy commission thoroughly reviewed our procedures 
and asked for the use of a third trusted party for additional 
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recoding of the original coded data and for the installation 
of a scientific and ethical review board. These boards con- 
sist of researchers, both from our department and from 
other universities, Intego registering practitioners, a non- 
medical reviewer and an ethicist and lawyer. They assess 
whether the questions are ethically correct (Ethics Board) 
and scientifically sound (Scientific Board). They have an 
advisory function, comparable to a Data Safety Monitoring 
Board in clinical trials. After these improvements our pro- 
cedures are compliant with the law confirmed by decision 
nr. 13.026 of March 19 th , 2013. 

Results 

General practitioners 

Between 1999 and 2012, 144 different practices provided 
their data. Based on the criteria specified above, 51 prac- 
tices (representing 95 general practitioners) were se- 
lected for inclusion in the database. The practices are 
spread throughout Flanders, but are not selected to be 
representative for the Flemish general practitioners. The 
general practitioners in the project represent slightly 
fewer women and are slightly older than the Flemish 
general practitioners overall. 

Patients 

The database contains information on 285 357 different 
patients, with 2 million patient-years for 1994-2011. In 
the year 2011, 115 303 of these patients contacted their 
general practitioner. This corresponds to an estimated 
practice population of 143 495, which is 2.3% of the 
Flemish population. The Intego-population is represen- 
tative for the Flemish population in terms of age and 
sex (Figure 1). The largest difference is 3.2% in the 
age group 25-44 years. In all these patients 3 million 



diagnoses, 12 million drug prescriptions and 29 million 
laboratory tests have been recorded. 

Data 

The proportion of new diagnoses in each ICPC-chapter 
is stable over the years (Figure 2) with respiratory disor- 
ders making up the vast majority of diagnoses (33% of 
all new diagnoses in 2011). These are followed by mus- 
culoskeletal disorders with 18%, gastro-intestinal disor- 
ders with 13% and skin diseases with 10%. These four 
chapters together constitute 74% of all new diagnoses. 
The 13 remaining chapters each represent less than 4% 
of all new diagnoses. These proportions correspond well 
to the 20 most common diagnoses (Table 2), with infec- 
tions of the respiratory tract and diseases of the loco- 
motor apparatus at the top of the list. 

A typical analysis dataset contains event dates for all 
variables under study and patient characteristics (e.g. 
gender and year of birth). 

Basic incidence rates are calculated by 1000 or 100,000 
person-years and stratified by gender. For example the 
incidence rate of malignant melonoma is 8/100,000 
patient-years overall, and 6.5 and 9.7/100,000 patient- 
years for males and females. In many cases results are 
also presented by age groups. Prevalence rates can be 
calculated for chronic conditions. 

Because all prescriptions are registered in the electronic 
health record these data can be used for studies on drug 
prescriptions (e.g. the relationship between atypical anti- 
psychotic use and subsequent risk of diabetes) [16]. 

Several international peer reviewed articles have been 
published using data from the Intego database. For 
example, Intego produced antibiotic prescribing indica- 
tor values. These values reveal huge opportunities to 
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Figure 1 The Flemish population and the population of the Intego-database by age. 
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Table 2 The 20 most frequent new diagnoses in 2011 



Female Male 



ICPC Code 


Diagnosis 


Total number 


%o 


ICPC Code 


Diagnosis 


Total number 


%o 


R74 


Upper respiratory infection acute 


40391 


233 


R74 


Upper respiratory infection acute 


36248 


226 


R80 


Influenza 


13016 


75 


R80 


Influenza 


13151 


82 


D73 


Gastroenteritis presumed infection 


12305 


/I 


D73 


Gastroenteritis presumed infection 


12060 


IS 


R78 


Acute bronchitis/bronchiolitis 


10288 


59 


R78 


Acute bronchitis/bronchiolitis 


10230 


64 


U71 


Cystitis/urinary infection other 


8080 


47 


L03 


Low back symptom/complaint 


7345 


46 


L03 


Low back symptom/complaint 


7427 


43 


L87 


Bursitis/tendinitis/synovitis nos 


5583 


35 


L83 


Neck syndrome 


6987 


40 


S16 


Bruise/contusion 


4702 


29 


R75 


Sinusitis acute/chronic 


6454 


37 


H71 


Acute otitis media/myringitis 


4409 


28 


L87 


Bursitis/tendinitis/synovitis nos 


5992 


35 


L83 


Neck syndrome 


4148 


26 


H/l 


Acute otitis media/myringitis 


4434 


26 


R75 


Sinusitis acute/chronic 


3699 


23 


R76 


Tonsillitis acute 


4225 


24 


R76 


Tonsillitis acute 


3647 


23 


S16 


Bruise/contusion 


4166 


24 


R77 


Laryngitis/tracheitis acute 


2803 


17 


D87 


Stomach function disorder 


3987 


23 


S74 


Dermatophytosis 


2615 


16 


R77 


Laryngitis/tracheitis acute 


3786 


22 


D87 


Stomach function disorder 


2547 


16 


L92 


Shoulder syndrome 


3002 


1/ 


L92 


Shoulder syndrome 


2519 


16 


R05 


Cough 


2526 


15 


R05 


Cough 


2242 


14 


L86 


Back syndrome with radiating pain 


2450 


14 


D84 


Oesophagus disease 


2053 


13 


D84 


Oesophagus disease 


2423 


14 


L86 


Back syndrome with radiating pain 


2050 


13 


A04 


Weakness/tiredness general 


2373 


14 


K86 


Hypertension uncomplicated 


1873 


12 


S74 


Dermatophytosis 


2183 


13 


S18 


Laceration/cut 


1762 


11 
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improve the quality of antibiotic prescribing, especially 
the prescription of recommended antibiotics [17]. Next 
to this, it was found that chronic kidney disease is highly 
prevalent in patients with type 2 diabetes mellitus and 
that only a minority of patients evolve into severe de- 
cline which is associated with younger age, male gender, 
a year-to-year decline of >10 mL/min and manageable 
factors such as blood pressure, blood glucose, associated 
drugs prescriptions including statin therapy [14]. An- 
other example is low back pain (LBP). Most studies on 
comorbidity in LBP have been conducted in specialized 
settings with the use of self-reports. Low back pain is 
one of the most frequent diagnoses in general practice. 
Striking is the higher frequency of common self-limiting 
diseases in patients with a diagnosis of LBP during the 
same year [18]. In another study a moderate significant 
association between herpes zoster and subsequent can- 
cer risk in women older than 65 years was found, with- 
out any influence of antiviral therapy. No association 
was found with herpes simplex [19]. 

Quality of the data 

We continuously compare our results with results from 
other registries and groups. Up to now, there is no gold 
standard to measure the quality of registered data. Regis- 
tering practitioners have to pass the above mentioned 
quality criteria. It is however not clear to what extent 
these criteria really increase the quality of gathered data. 
It is also unclear whether supplemental criteria can be 
useful to increase the quality. External validation of the 
data with other registries compares similar databases 
which may suffer from the same biases. A comparison 
with registers such as the NIVEL primary care database 
or IPCI database in the Netherlands or the Health Im- 
provement Network (THIN) database in the UK might 
be interesting for this purpose, since they use both 
coded and free text data. All are large longitudinal gen- 
eral practice research databases which are based on 
routinely collected data of a non selected population. 
Comparison to data from databases where all general 
practitioners who want to participate can participate in 
Intego which selects their general practitioners based on 
stringent registration quality criteria might provide use- 
ful insights to our quality criteria. 

With regard to cancer incidences, Intego and LIKAR 
respectively registered 3.83 and 3.63 cases of invasive 
cancers (exclusion of basal cell carcinoma) per 1,000 pa- 
tient years [20]. Incidences of melanoma (8 versus 8.55 
/ 100,000 patient years), lung (12 and 79 versus 15 and 
79/100,000 patient years for females and males) and cer- 
vical cancer (9 versus 9.13 / 100,000 patient years) were 
also very comparable [21,22]. Comparisons on transient 
ischemic attacks (TIA) and stroke were performed with 
the general practice sentinel network (Belgian Scientific 



Institute for Public Health), resulting in comparable inci- 
dences for stroke and higher incidences for TIA in 
Intego (1.93 versus 1.73 for stroke and 0.89 versus 1.29 
/ 1,000 patient years for TIA) [23]. In preparation of the 
HjNi vaccination campaign the incidence of Guillain 
Barre was examined by comparing the number regis- 
tered in the database for the most recent years and num- 
bers based on recollection by the general practitioners, 
assuming every general practitioner remembers the cases 
in his/her practice because of its low incidence and its 
dramatic clinical picture. Both were in very high agree- 
ment, meaning around 1/100,000 patient years and 
higher than a specialist (neurologists) registration. Com- 
parison with the EISN showed high agreement with 
regard to Influenza like illness and acute respiratory in- 
fections [6]. 

The yearly incidence of serious infections in children 
aged 0-4 years in the Continuous Morbidity Registra- 
tion (CMR) is 27-30/1,000 patients, 18.3 in the Dutch 
Transition project, 18.1 in Intego and 15.7 in the Weekly 
Returns Service (the latter is based on children from 
1-4 years old) [8]. Several comparisons with CMR and 
RNH resulted in similarities and differences but for none 
of the networks a specific direction was found with 
regard to the differences. The eHID project, with the 
objective to investigate the operational features of net- 
works providing epidemiological information based on 
the extraction of routinely collected health related data in 
order to make recommendations on best recording prac- 
tices, compared disease frequency in several European 
member states. The Intego data were comparable to other 
networks [24]. 

Internal validation can be performed using tracers. For 
diabetes and anti- diabetics 88% of the people who were 
prescribed anti-diabetics (ATC, A10A, A10B and A10X) 
also had a diabetes diagnosis. With regard to dementia 
(ATC, N06DA), migraine (ATC, N02CC) and Parkinson 
(ATC, N04A and N04B), associations were lower, re- 
spectively 52%, 69% and 36%. Quality improvement pro- 
jects are currently ongoing to study these results. 

Discussion 

Strenghts 

Computerisation provides relatively inexpensive and easy 
access to large volumes of data. They are population 
based and can quickly produce large samples of patients. 
Because of the longitudinal nature of the data, many re- 
search questions can be answered in a more time and 
cost effective manner compared to other study designs 
(e.g. newly set up cohort studies). 

The most frequently used design is the retrospective 
cohort design which enables researchers to analyze a 
large amount of data collected and carefully stored in 
the past, while analyzing them using a cohort design. 
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The design enables researchers to use information col- 
lected in tempero non suspecto, without any hindsight 
of the objectives of the actual study and avoiding the 
main risks of bias in prospective study designs, especially 
selection and recall bias. Thus, the retrospective design 
guarantees that the measurement of predictor variables 
was not biased by knowledge of which subjects had the 
outcome of interest. In a prospective design, knowledge 
of exposure status may bias classification of the out- 
come. Also, in a prospective design, being in the study 
may alter participant's behavior. Since data in a registry 
are routinely collected, this inclusion bias does not count 
for retrospective study designs. Compared to a case- 
control design, an advantage of the retrospective cohort 
design is that all of the subjects who developed the out- 
come (cases) and all those who did not (control) come 
from the same population. 

A major strength of these types of cohort studies in 
general is the possibility to study multiple exposures and 
multiple outcomes in one cohort. Even rare exposures 
can be studied. The combined effect of multiple expo- 
sures on disease risk can be determined. Hypothesis gen- 
eration is another benefit of cohort studies considered as 
a way to pick up associations between many exposures 
and outcomes. Yet because of lack of randomization, co- 
hort studies do not permit conclusions about causality. 
Several underlying etiological hypotheses can be gener- 
ated, to be tested in other confirmatory studies. 

Deckers et al. formulated minimal criteria for a pri- 
mary care network [25]. Computerisation provides rela- 
tively inexpensive and easy access to large volumes of 
data. A sufficient sample size is advised to be about 1% 
of the population, which allows the study of common 
diseases. Increasing the sample size much more will lead 
to an increased workload, but is not expected to result 
in additional information. Intego covers more than 2% of 
the Flemish population, highly representative for age and 
gender. Data on age is collected on a continuous scale and 
can be divided into as many age groups as needed. Data 
from the previous registration year are collected in the 
first half of the year. Although data are not collected on a 
weekly basis, they can be reconstructed in retrospect to 
obtain weekly, monthly or even daily information. 

Because Intego is based on routinely collected data 
and the study population is selected with broader inclu- 
sion and less exclusion criteria compared to an RCT, its 
results may be more generalizable to clinical practice. 

Weaknesses 

Some variables, which may be important confounders in 
public health research are not measured (occupation, 
employment, and socioeconomic status), measured im- 
precisely, or even unknown, such as smoking or mortal- 
ity, which is only registered partially. Information from 



specialists as well as events that occur in hospital may not 
be fully captured in the electronic health record. Over the 
counter medications and treatments given in hospital are 
not readily available. Exposure status (e.g. onset of treat- 
ment, exact year of diagnosis) may be missed because it 
has occurred prior to start of the registration. 

The most ill can die, others can be lost to follow-up 
because they moved. This will bias the results when se- 
lective follow-up rates differ between index and refer- 
ence group. Loss to follow-up can be related to future 
outcome. The Healthy Survivor Effect (HSE) can be de- 
scribed as a continuing selection process in the cohort 
due to survival or maintenance of the healthiest individ- 
uals, whereas survival/maintenance process may differ 
amongst the selected groups (e.g. diabetes or not). The 
study will include only patients remaining in the system, 
a survivor (healthier) population. HSE can lead to lower 
than expected outcomes, can interact with exposure vs. 
outcome associations between groups (e.g. statin effect 
in longitudinal design on elderly diabetic patients with 
and without CKD). When there is no exposure-disease 
relationship, higher cumulative exposure appears pro- 
tective of health. The effects of healthy patient effect 
biases may vary by gender, race/ethnicity, social class, 
work status, age at inclusion, length of follow-up or 
cause of mortality/morbidity. 

Differential misclassification can lead to an overesti- 
mation or underestimation of the effect between expos- 
ure and outcome. Classification of individuals (exposure 
or outcome status) can also be affected by changes in 
diagnostic procedures. 

Intego is based on routinely collected data. Does the 
general practitioner make a correct diagnosis or assess- 
ment? This refers to the discussion about diagnosis 
(cough, cold,...) in primary care, but also to the problem 
of episode registration and changing diagnoses (ex. Myco- 
plasma pn. Infection). Criteria based diagnoses are more 
accurate than symptom based diagnoses. Second, has the 
diagnosis/assessment been correctly coded in the elec- 
tronic health record? This question refers to the sensitivity 
(or completeness) and the positive predictive value (or 
correctness) of electronic health record -based data [26] . 

Although the patient population is representative for the 
Flemish population, registering general practitioners are 
not representative for the general practictioner population. 
It is a selected group of high quality registering practi- 
tioners which use a specific electronic health record. This 
selection bias of general practitioners could eventually 
have an influence on some process parameters in the 
follow-up of patients. 

Difficulties arise with tracers, when some drugs are 
prescribed for other conditions as well. For example 
anti-epileptics are also prescribed for chronic pain. Also 
a general practitioner might have a very specific reason 
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for describing a specific drug to a specific person. The 
lower associations between dementia, migraine and 
Parkinson with their tracer medications probably were 
caused by this lack of specificity. 

Future 

In the near future, large databases with all kinds of infor- 
mation will be widely available. Some of these registries 
will contain data of almost the entire population of a 
country or a region. This is a challenging situation for 
the existing databases collecting data from a smaller 
sample population. We will have to consider redefining 
our strategic goals, methods and procedures. 

The option of collecting either blood or buccal smear 
samples of patients in Intego is being explored [27]. 
Together with background characteristics and a full in- 
ventory of recorded morbidity, laboratory results and 
prescribed drugs, this would enable research on gene- 
environment and pharmaco-genetic interactions. This 
can only be performed after strict informed consent and 
with the agreement of the ethical review board, the na- 
tional privacy authorities, and the Flemish authorities 
supervising the Flemish Biobank activities. 

We will continue to explicitly structure quality proce- 
dures with training and coaching, introduce criteria for 
diagnosis (for example GOLD criteria for COPD classifi- 
cation) and list missing data for possible confounders, 
such as smoking and SES. We will check the validity of 
our 3 existing quality criteria and continue to work out 
and implement internal and external validation proce- 
dures. We are convinced that this procedure will pro- 
cure a quality label enabling small registries like Intego 
to occur as a gold standard for large databases with a lot 
of data often extracted from 'bad' registrators. 

Finally, registries like Intego with a stable group of 
practicising general practitioners also offers possibilities 
for targeted, experimental, randomized trials. 

Conclusion 

With 95 general practitioners and data covering over 
two million patient-years, Intego certainly is not one of 
the largest general practice based computerized morbid- 
ity registries in the world. Nevertheless, the database en- 
ables us to present and compare data on background 
information, incidence and prevalence rates, laboratory 
results, and prescribed drugs for all relevant subgroups 
on a routine basis and is unique in Belgium. This results 
from two basic requirements: a group of dedicated gen- 
eral practitioners keeping high quality patient records 
using highly structured software and a procedure that 
does not require them to do anything special, additional 
to this record keeping. 
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